05. Benchmark Implementation
For this project, you can use any algorithm of your choosing to solve the task. You are strongly encouraged to do your own research and devise your own approach to solving this problem.
## Some Hints
Since you're well on your way to mastering deep reinforcement learning, we won't provide too many hints for this project. That said, in our solution code, we decided to start with the DDPG code to solve this project. To adapt it to train multiple agents, we first noted that each agent receives its own local observation. Thus, we can easily adapt the code to simultaneously train both agents through self-play. In our case, each agent used the same actor network to select actions, and the experience was added to a shared replay buffer.
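To make that idea a little more concrete, here is a minimal Python sketch of the shared-actor, shared-buffer setup. This is not the solution code: the `SharedReplayBuffer` class, the `select_actions` helper, and the observation/action sizes used below are illustrative assumptions standing in for the corresponding pieces of your DDPG agent.

```python
import random
from collections import deque, namedtuple

import numpy as np

# Assumed (illustrative) sizes: 2 agents, 24-dim observations, 2-dim continuous actions.
NUM_AGENTS, STATE_SIZE, ACTION_SIZE = 2, 24, 2

Experience = namedtuple("Experience", ["state", "action", "reward", "next_state", "done"])


class SharedReplayBuffer:
    """One buffer that both agents write to, so each learns from the other's experience."""

    def __init__(self, max_size=int(1e5)):
        self.memory = deque(maxlen=max_size)

    def add(self, state, action, reward, next_state, done):
        self.memory.append(Experience(state, action, reward, next_state, done))

    def sample(self, batch_size=128):
        return random.sample(self.memory, k=min(batch_size, len(self.memory)))


def select_actions(actor, states, noise_scale=0.1):
    """Both agents query the *same* actor on their own local observation.

    states has shape (NUM_AGENTS, STATE_SIZE); returns (NUM_AGENTS, ACTION_SIZE).
    """
    actions = np.vstack([actor(states[i]) for i in range(NUM_AGENTS)])
    actions += noise_scale * np.random.standard_normal(actions.shape)  # exploration noise
    return np.clip(actions, -1.0, 1.0)


if __name__ == "__main__":
    buffer = SharedReplayBuffer()
    # Stand-in for the trained DDPG actor network (here it just returns random actions).
    random_actor = lambda state: np.random.uniform(-1.0, 1.0, ACTION_SIZE)

    # One illustrative transition per agent, all added to the shared buffer.
    states = np.random.randn(NUM_AGENTS, STATE_SIZE)
    actions = select_actions(random_actor, states)
    next_states = np.random.randn(NUM_AGENTS, STATE_SIZE)
    for i in range(NUM_AGENTS):
        buffer.add(states[i], actions[i], 0.0, next_states[i], False)

    print(len(buffer.memory), "experiences in the shared buffer")
```

In an actual training loop you would replace `random_actor` with your actor network and let a single learning step sample from this shared buffer, so that experience gathered by either agent improves the one policy both of them use.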
But we'll leave the rest of the details up to you to discover! :)

You can do it!
## Note
Due to the multi-agent nature of this problem, you are likely to experience a bit of instability during training. For instance, we have plotted the scores from the solution code below. The blue line shows the maximum score over both agents for each episode, and the orange line shows the average of that maximum over the next 100 episodes.
Note that the agents perform horribly starting around episode 2500 and show no evidence of recovery. However, at one point, we achieved an average score (over 100 episodes) of +0.9148!

Scores plot from the solution code.
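For reference, here is one way the two curves described above could be computed from raw per-agent episode scores. This is only a sketch with made-up data; the function name, array shapes, and the stand-in scores are assumptions, not the solution code.

```python
import numpy as np


def scores_to_curves(raw_scores, window=100):
    """raw_scores: array of shape (num_episodes, num_agents) holding each agent's
    undiscounted return per episode (illustrative data, not the solution's)."""
    max_per_episode = raw_scores.max(axis=1)  # blue line: max over both agents each episode
    moving_avg = np.array([
        max_per_episode[i:i + window].mean()  # orange line: average over the next 100 episodes
        for i in range(len(max_per_episode) - window + 1)
    ])
    return max_per_episode, moving_avg


if __name__ == "__main__":
    fake_scores = np.random.rand(300, 2)  # 300 episodes, 2 agents, random stand-in data
    blue, orange = scores_to_curves(fake_scores)
    print(blue.shape, orange.shape)
```

The environment is considered solved once the orange curve (the 100-episode average of the per-episode maximum) reaches +0.5, which is why the +0.9148 peak above comfortably clears the bar even though training later destabilizes.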